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Abstract 



Weinberg ([20121 ) described a constructive algorithm for computing the marginal likelihood, 
Z , from a Markov chain simulation of the posterior distribution. Its key point is: the choice of an 
integration subdomain that eliminates subvolumes with poor sampling owing to low tail- values 
of posterior probability. Conversely, this same idea may be used to choose the subdomain that 
optimizes the accuracy of Z. Here, we explore using the simulated distribution to define a small 
region of high posterior probability, followed by a numerical inte gration of the s ample in the 
selected region using the volume tessellation algorithm described in Weinberg] ( 2012 ). Even more 
promising is the resampling of this small region followed by a naive Monte Carlo integration. 
The new enhanced algorithm is computationally trivial and leads to a dramatic improvement in 
accuracy. For example, this application of the new algorithm to a four-component mixture with 
random locations in 16 dimensions yields accurate evaluation of Z with 5% errors. This enables 
Bayes-factor model selection for real-world problems that have been infeasible with previous 
methods. 

Keywords: Baycsian computation, marginal likelihood, algorithm, Bayes factors, model 
selection 



1 Introduction 



Bayesian methods hold the promise of selecting general models with different dimensionality and 
unrelated structure. For example, consider a collection of such models, M = {Mi, M2, ■ ■ ■ M m }, 
proposed to describe the data D. Each model Mj is described by parameter vectors 6j G M dj where 
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dj = dim(#j). Bayes theorem gives the posterior probability density for each model: 

P(Mj)P(Y>\Mj) 



P(Mj\T>) 



P(D) 



where P(M,) is the prior probability of Model j, P(D) is an unknown normalization and 



PfDlM,- 



dOj P{Oj\Mj)P(D\0j,Mj) 



(1) 



(2) 



is the marginal likelihood for Model j. The posterior odds of any two models j, k G [1, m] is then 



P(Mj\D) 
P{M k \D) 



P(M fc ) 



P(D|M?) 
P(D|M fc ) 



(3) 



now independent of the normalization P(D). The first term on the right-hand side describes the 
odds ratio from prior knowledge and the second term is the Bayes factor. In most cases, the set 
of models is given a counting measure; the posterior odds ratio is then equal to the Bayes factor. 
If P(Mj\D) > P(M k \D) for all k G [l,m],k / j, Model j best explains the data out of all the 
proposals in M. 

The Bayes factor requires the computation of the marginal likelihood for each model. Analytic 
computation is almost never possible and direct evaluation by numerical quadrature is almost never 
feasible for models of real- world dimensionality and complexity. This has led to a variety of approx- 
imations based on special properties of the models or their posterior distributions. For example, a 
smooth unimodal distribution that is well-repres ented by a multid i mens ional normal distribution 
can be evaluated by Laplace approximation (e.g. iKass and Raftervl. 119951). T h e dimensionality for 
nested models can be effectively lowered as described in lDiCicio et al.l (119971 ) . IChib and Jeliazkov 
(|200lh describe an efficient approach for models amenable to block sampling. A n umber of astro- 
nc^Tcal problems of current interest (e.g. Ivoor, et all ffl; Er^ul EffiJ do not fit into 
these categories and require expli cit method s. 



Motivated by these problems, IWeinbergl (12012 ) explored the direct use of MCMC samples to 
compute the marginal likelihood and proposed two algorithms. In essence, both use the MCMC 
sample to identify the important regions of parameter space. The first algorithm modifies the 
harmonic-mean approximation to remove the low-probability tail of the distribution that dominates 
the error. However, if the harmonic mean integral itself is improper, as it typically is for problems 
with weakly informative prior distributions, this algorithm will fail. The second algorithm assigns 
probability to a tree partition of the sample space and performs the marginal likelihood integral 
directly. This algorithm is consistent for all (proper) posterior distributions. Additional recent 
applications have suggested a number of important extensions to these ideas, which i s what we 
will ex plore in this paper. We will begin in $2]with a intuitive motivation and review of IWeinberg 
<|2012h . 
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2 The Main Point 



This contribution focuses on a further extension of the second algorithm that dramatically improves 
its feasibility in a va riety of cases. Le t Q be the MCMC sample of the desired posterior probability. 
The central point of Weinberg ( 20121 ) is the following: the marginal likelihood P(D\Mj) is defined 



by 

P(D\Mj) I dP(0j\D)= I 'lOjI'iOj .U,)/'{l)0,,.U ; ) (4) 

where the set Q s C may be chosen to optimize the numerical evaluations of the integrals in 
equation Evaluated by Monte Carlo sampling, the integral on the left-hand side of equation 
([U is simply the fraction of points in Q s relative to the number in f2. The integral on the right- 
hand side of equation ^ is performed by quadrature after the measure is assigned by associating 
tessellated volume elements v(uji) in R d J to each point or group of points Wj in Q. The unity of 
all subvolumes Vi = v(ui) in the tessellation is a convex hull in M. d j . By construction, the MCMC 
algorithm provides samples such that J u d6jP(6j\D) ~ P{6\D)vi ~ constant for some 6 € cjj. 
Therefore, relatively small values of P(0) will be associated with relatively large values Vi, and, 
therefore, will contribute most of the variance to the resulting quadrature on the right-hand side 
of equation (j4]). 

This motivates seeking subsets Q s C 0, that preserve the measure defined by the tessellation that 
minimizes the variance of equation (JH). A particular solution is intuitively obvious: successively peel 
the subvolumes on the hull until the volumes Vi are sufficiently small that P{6j\D) varies slowly 
across Vi while preserving the condition \£l s \ ^> 1 so that the error in the integral on the left-hand 
side of equation remains smalQ. Below, we explore three extensions to this approach whose 
goals are optimizing the choice of Q s to accurately and efficiently evaluate equation @ . In §3. 1 1 we 
try the peeling approach. In §3.21 we identify an easy to tessellate volume near the posterior mode 
containing the subset oj s and retessellate this volume. In addition, rather than using the original 
MCMC sample, the new subvolume identified from the sample can be resampled with a new more 
efficient sampling function, improving the results. 

These proposed extensions do not circumvent the curse of dimensionality ( Bellman . 20031 ): we 



still are required to sample a high-dimensional space. However, the approach described above 
allows us to choose a domain that results in the most accurate integral with the smallest number 
of samples. 
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Figure 1: Values of log Z obtained with the volume-peeling strategy compared with both volume 
tessellation algorithms for a chain of three million states. The lower axis (upper axis) shows the 
enclosed volume fraction (number of points) in the 'peeled' subvolume. The Laplace approximation 
applied to a subset of the entire sample (using the upper axis) is shown for comparison. The true 
value is log Z = 0. 
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3 Using MCMC for importance sampling choice of the subdomain 



3.1 Volume peeling 

We tried two implementations of volume peeling. Both define the geometric center of mass and 
slowly decrease the scale of the self-similar volume in MrJ . In the first implementation, We elim- 
inated all subvolumes V{ outside of or containing the boundary of the scaled self-similar volume. 
In the second, we computed the set £l s contained within the new boundary and recomputed the 
tessellation. The latter is slightly more accurate than the former but requires a new tessellation. 
We adjust the scale factor to retain a fixed number of points N, and, therefore, limit the error in 
the left-hand side of equation @ to a root variance of approximately 1 /y/~N. 

The results of applying the volume peeling strategy to a Markov-chain-sampled normal distri- 
bution in twelve dimensions is illustrated in Figure [H For the tessellation, we use a k-d tree with 
a round-robin geometric bisection, sometimes known as the orthogonal recursive bisection (ORB) 
tree. This tree recursively subdivides the cells into equal subvolumes, one dimension at a time. The 
recursion stops when the next division produces cells with occupation numbers below the target 
size c. This algorithm prevents cells with extreme axis ratios but the cells will not have the same 
number of counts. For this test, we adopt a target cell count of c = 8. The figure shows both the 
Riemann (V TAj ) and L ebesg ue (VTA2) variants for estimating the integrals in equation as in 



described in IWeinbergl (120121 ). The Riemann computation uses the median value of the posterior 



probability in each cell to compute the contribution to the Riemann integral: 

I d6 P(0|D) « v ( UJ j) median{P(0 fc |D)} (5) 

where 9 k £ uj and v(uij) is the hypervolume in the subdomain j (i.e. the cell). The Lebesgue 
computation assigns the hypervolume by probability according to the monotone function f(P) as 
follows: 



Jp>P k ;k^ 3 Ml keu . 



1 if P > P k+l 

f(pZ)-fik) else if > P > P fc , (6) 
otherwise 



where we assume that Pi > P k if / > k. For the computations here, we choose f(x) = x. The 
Lebesgue integral is then approximated as 

j dGP{e\T>)^ l -{P k+l -P k ) fj>(w fc+1 ) + J>(o; fc )j • 

The Riemann and Lebesgue constructions differ, even in the limit c = 1. 



x We denote the cardinality of Set 5* by 15*1 
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Figure Q] demonstrates that the choice of an appropriate subdomain results in acceptable ap- 
proximations to the marginal likelihood. In all cases, the sparsely sampled tails of the distribution 
coupled with regions of empty volume bias the evaluation of Z upward. As the volume is peeled 
from the outside, the offset from the true value Z = 1 (logZ = 0) decreases dramatically, as ex- 
pected. All evaluations are for a fixed chain size. Therefore, larger values of c imply lower spatial 
resolution. In addition, as c increases and the cell width approaches the characteristic scale of the 
posterior distribution, the variance of P in the cell increases and the approximations in equations 
© and (|6l) become inaccurate. On the other hand, a small volume fraction constrains the sampled 
region to the peak of the distribution. Near the peak, the variance of P decreases and accuracy 
is recovered. This explains the shift to smaller volume fraction required to obtain accuracy with 
higher c in Figure [H Both variants are biased in the limit of small volume fractions; the Lebesgue 
(Riemann) method is biased high (low). The Lebesgue (Riemann) variant with c > 8 (c = 1) has 
the smallest bias. The Lebesgue variant has the lowest bias overall. 

The underestimation of Z by the Riemann algorithm makes intuitive sense; the distribution of 
probability values in each cell will be exponentially skewed toward higher values. Using the median 
value as the representative value for the cell will tend to underestimate the cell's true contribution. 
The overestimation of Z by the Lebesgue owes to the linear assignment of probability to volume 
in the measure function (our choice of f(x) in eq. [6]); the true value of f(x) is likely to be some 
convex-up function. 

We show the Laplace approximation for comparison. We use the same number of enclosed points 
implied by the volume fraction for the Riemann and Lebesgue variants but randomly sampled from 
the entire chain with no volume restriction. As expected, the Laplace approximation converges to 
the true value for a large sample from a multivariate normal distribution. Similar results obtain 
for lower and higher dimensionality. We have checked this up to d = 16. 

Applying the Lebesgue variant of the volume tessellation algorithm (VTA) to an appropriately 
'peeled' volume for a unimodal distribution provides a usefully accurate evaluation of the marginal 
likelihood. Now imagine a bimodal distribution with two widely separated modes. The volume 
peeling strategy will fail miserably: the central fractional volume will tend to sit in the poorly 
sampled desert between the two modes. An adaptive approach is needed. 

3.2 Identification of the posterior mode 

The previous family of subdomain selections relies purely on the shape of the enclosing volume. 
In general, this algorithm will not center the subdomain on the shallowest part of the posterior 
distribution. For a worst-case counterexample, consider two equally shaped but widely separated 
spherical modes in parameter space. The algorithm in §3.11 will eliminate the peaks and retain the 
tails of both modes. This leads to a worse estimate than the original estimate based on the full 
posterior sample. 

This counterexample suggests the following alternative approach: 



G 



1. Set the center to the location of the parameter point Ok with the maximum value of the 
posterior probability: P(0 fc |D) > -P(0j|D) for all j / k,j G [1, . . . , N] 

2. Compute the shape of the hyperrectangle initially from the parameter ranges of the entire 
sample: oq t = max{#i r , . . . , O^r} — vam{0\ r , . . . , Ont}- 

3. Let q count the number of iterations and set q = to start. 

4. Compute the distances of the sample from 6^: d 2 - = ^2 r (0j r — &kr) 2 / a q r 

th 

5. Let d be the M distance in the sorted list of distances. Choose M large enough to achieve 
a sufficiently small variance in d (e.g. M = 10 3 ). The enclosing hyperrectangle now has the 
coordinates O m in,r = Okr - &0rd and B max ,r = &kr + &0rd- 

6. Increment q and recompute the shape of the hyperrectangle from the variance of the entire 
sample, a 2 qr = Y^jtfjr ~ &kr) 2 for Oj G [0 min ,6 max \. Repeat Steps SHSJ 

7. Steps HHS may be iterated until converged, if desired. 

8. Finally, tessellate the volume defined by these M points and compute the right-hand side of 
equation Q 

This algorithm significantly improves the marginal likelihood computations, resulting in errors 
in the log of the marginal likelihood of several tens of percent, i.e. |<51og P(D\Mj)\ < 0.2. However, 
the bias in these estimates appears to decrease slowly with sample size. This bias appears to result 
from large changes in P(Qj\D) across the subvolumes. To test this speculation, we replaced Step M 
in the algorithm listed in §3.21 by a uniform resampling of the hyperrectangle. A uniform sampling 
over the volume prevents the volume of the cells growing as the probability value decreases and 
results in a strong suppression of the bias. A sampling function with less variance than uniform 
over the sample volume might lead to better accuracy while still suppressing the bias, but we have 
not investigated this possibility. 

3.3 Test problems 

Here, the original and new algorithms are applied to the following four test distributions constructed 
from one or more normal components with a variance a 1 = 0.003 and centered in the unit hypercube 
of dimensionality d. The centers for the test distributions are as follows: 

1. A single distribution with center x = (0.5, . . . , 0.5). 

2. Two separated distributions with centers x\ = (0.2, 0.2, 0.5, . . . , 0.5) and %2 = (0.8, 0.8, 0.5, . . . , 
0.5) with corresponding weights w\ = 0.6, W2 = 0.4. This distribution simulates two widely 
separated modes. Figure [2] (left) shows the distribution marginalized in all but the first two 
dimensions for d = 8 and for a MCMC-generated sample of 10 5 points. 
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Two overlapping distributions with centers x\ = (0.4, 0.4, 0.5, . . . , 0.5) and x<i = (0.6, 0.6, 0.5, . . . 
0.5) with corresponding weights w\ = 0.6, W2 = 0.4. This distribution a distribution with 
two local maxima on a common pedestal. Figure [2] (right) shows shows the distribution 
marginalized in all but the first two dimensions for d = 8 and 10 5 MCMC points. 

Four randomly- oriented distributions with their centers uniformly selected from the hypercube 
[0.5 — 2a, 0.5 + 2a] d at random with Dirichlet distributed weights and shape parameter a = 1. 
This produces an asymmetric distribution with multiple maxima connected by "necks" of 
varying amplitude an d emulate s featu res of posterior distributions from parametric models 
found in practice (e.g. Lu et al. . 20121 ). Figure [3] shows the distribution in pairs of marginal 
variables for d = 8. 




(a) separated (b) overlapped 

Figure 2: Marginalization of two-component Gaussian distribution in 8 dimensions using 10 5 
MCMC samples for Test Distribution 2 (left) and Test Distribution (3) (right). The contour levels 
are 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 0.995, 0.999. 

Table Q] summarizes the ma rginal likel i hood evaluation using the original VTA algorithm with 



the ORB tree as described in IWeinberd (]2012l ) with volume trimming. In all cases, the input 
chain has 2 x 10 5 states. The first group of columns lists the model, from the list above, and 
dimensionality. The second group of columns lists log Z using the entire MCMC sample from Q 
for the Riemann (VTAi), Lebesgue (VTA2) evaluations using volume tessellation and the Laplace 
method. For all but the lowest dimensionality, the resulting value of Z is biased upward as for 
reasons discussed previously. This is designed for comparison with the third and fourth groups of 
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Table 1: Error in marginal likelihood values: original algorithm 



Model 




Error in log Z 






100% Volume, 


c = 8 


0.1% volume, c = 8 


0.1% volume, c = 16 


Type 


d 


VTAi_2 


VTA^ 


Laplace 


VTAi 


VTA 2 


VTAi 


VTA 2 




4 


2.68 


0.55 


0.00 


-3.11 


-0.07 


-3.21 


-0.08 


1. Single 


8 


6.38 


5.03 


-0.08 


-2.56 


0.06 


-2.70 


0.01 




12 


9.33 


10.24 


-0.29 


-3.49 


0.29 


-1.59 


1.13 




16 


14.10 


11.58 


0.66 


13.22 


15.59 


12.75 


14.96 




4 


10.46 


9.10 


9.33 


oo 


oo 


oo 


oo 


2. Separated 


8 


10.78 


9.73 


8.40 


oo 


oo 


oo 


oo 




12 


11.07 


8.70 


1.12 


oo 


oo 


oo 


oo 




16 


13.00 


10.51 


3.49 


oo 


oo 


oo 


oo 




4 


3.08 


0.76 


0.49 


-4.60 


-0.03 


-3.55 


-0.13 


3. Overlapped 


8 


6.51 


5.35 


0.12 


-3.35 


0.29 


-3.55 


0.16 




12 


9.44 


10.11 


-0.71 


-3.61 


0.40 


-3.78 


0.20 




16 


12.65 


13.87 


-0.64 


5.36 


6.38 


14.25 


15.76 




4 


-2.50 


-0.01 


4.76 


-2.51 


0.02 


-4.26 


0.12 


4. Random 


8 


-1.37 


0.67 


4.41 


-1.13 


0.92 


-1.92 


0.99 




12 


1.37 


3.81 


4.40 


7.99 


9.79 


-1.24 


2.42 




16 


-15.76 


16.57 


4.67 


14.91 


16.20 


14.33 


17.06 



"Riemann evaluation 
'Lebesgue evaluation 



Table 2: Error in marginal likelihood values: important region 



1V1UUC1 




Error in logZ 






Subregion 


Resampled 






VTA i 


VTAn 


\/ QQTl 

IVlfcJdll 


VTA i 


VTAo 


IVlccLLl 




A 


—0.090 


—0.093 


-0 0092 


—0.014 


—0.023 


—0.016 




Q 
o 


-0.667 


-1.097 


-0.676 


-0.081 


-0.121 


-0.0954 




12 


-0.746 


-1.178 


-0.795 


0.056 


-0.021 


-0.018 




16 


-0.43 


-1.08 


-0.48 


-0.13 


-0.12 


-0.09 




4 


-0.472 


-0.484 


-0.475 


-0.040 


-0.032 


-0.036 


2. Separated 


8 


-0.710 


-1.089 


-0.703 


-0.115 


-0.163 


-0.140 




12 


-0.667 


-1.41 


-0.717 


-0.121 


-0.200 


-0.139 




16 


-0.702 


-0.702 


-0.571 


0.027 


-0.161 


-0.141 




4 


-0.389 


-0.389 


-0.391 


0.108 


0.102 


0.105 


3. Overlapped 


8 


-0.002 


-0.076 


-0.020 


0.075 


0.016 


0.053 




12 


-0.4451 


-0.548 


-0.515 


0.070 


-0.021 


0.0178 




16 


-0.455 


-0.846 


-0.520 


0.200 


-0.031 


0.048 




4 


-0.460 


-0.466 


-0.463 


0.099 


0.044 


0.044 


4. Random 


8 


-0.456 


-0.502 


-0.499 


0.021 


0.044 


0.044 




12 


-0.595 


-0.704 


-0.628 


0.099 


-0.081 


-0.080 




16 


-0.595 


-0.740 


-0.628 


0.129 


0.069 


0.070 



columns computed for the fraction 0.001 of the original tessellated volume and c = 8 and c = 16, 
respectively. 

Most notably, the model with two separated normal distributions has no samples in the sub- 
volume between the two modes. For the other three cases, the trimmed volume leads to improved 
accuracy. In most cases, the Lebesgue evaluation (VTA2) significantly outperforms the Riemann 
evaluation (VTA2) owing to the extra information about the distribution of posterior probability 
values in each cell. In all cases, the posterior distribution is undersampled for d = 12 and d = 16 
leading to poor results, with some errors greater than 14 in the log! Even when the volume trim- 
ming is centered on the posterior mode as it is for Test Distribution 1, the sparse sampling of the 
mode for d = 12, 16 and fixed sample size yields large inaccuracies. 

Contrast the results from Tabled] with those from Table [2j which illustrates the effect of iden- 
tifying a subvolume around the peak posterior value as described in §3.21 with M = 1000. This 
choice implies a relative sampling error of l/\/1000 = 0.03. The posterior samples are identical in 
both cases. In Table [2l the second group of columns (identified by 'Subregion') lists the errors in 
log Z evaluated by retessellating the M important points and the third group of columns (identified 
by 'Resampled') lists the errors in log Z evaluated by resampling the important region uniformly 
in each dimension using 3 x 10 5 points. The column labeled 'Mean' denotes the average value 
of the points in the important region multiplied by the volume of the important region. This is 
equivalent to the naive Monte Carlo evaluation of Z for the resampled case, which we show for the 
retessellated case for comparison. 

In nearly all cases, the new algorithm outperforms the original one, and significantly so for high 
values of d. The accuracy of the resampled case is remarkable: the computed values are within 25% 
of the exact value in all cases and much smaller in most cases. No significant differences are found 
between the Riemann, Lebesgue, and naive Monte Carlo evaluations, owing to the slowing varying 
values of the posterior probability P(D\0j, Aij) across the important region. For the subregion 
evaluation based on the MCMC posterior sample, the results are clearly worse with O(oo) errors, but 
are still remarkably better for high dimensionality. For lower dimensionality, the original algorithm 
with volume trimmer out performs the new algorithm where the subregion contains enough points 
to make an evaluation possible. 

In summary, using the MCMC posterior sample to identify an important region around a domi- 
nant mode and resampling this region to evaluate the integral on the right-hand side of equation (JJ|) 
yields accurate results with a modest number of evaluations for dimensionality d < 16. Moreover, 
using the MCMC chain to evaluate the integral on the left-hand side of equation (j4]) and the naive 
Monte Carlo evaluation on the right-hand side yields an accurate result without the more elaborate 
volume tessellation. These results obtain even for a random distribution of four connected modes 
(see Fig. ED. 
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4 Summary and Discussion 



Markov chain Monte Carlo sampling of posterior distributions is a common tool in Bayesian infer- 
ence, particularly for parameter estimation. These samples, often obtained for suites of competing 
complex models often require a substantial investment in computational resources. This motivates 
reusing the sample whenever possible. For example, it would be wonderful to exploit Bayesian 
model selection techniques, Bayes factors in particular, without a new costly computational cam- 
paign or resorting to often in accurate approxi mations. 

Motivated by this desire, IWeinbergl (|2012l ) presented two algorithms for computing the Bayes 
normalization or marginal likelihood value using MCMC-sampled posterior distributions. The main 
point of that paper was that the numerical evaluation of the integrals in equation ^) that defines 
the normalization P(D\M) can be performed for a subdomain !] s C 1! and equation @ still holds. 
Our current paper suggests using an appropriately defined subset to eliminate volume elements 
with large errors. The main point is that a subsample with |fl a | <C but centered around a 
posterior mode renders the integrals in equation ([3]) as accurately as possible. 

Moreover, and much to our amazement, by resampling the small subdomain tt s , accurate values 
for the normalization P(D|A4) are obtained by estimating the fraction of the sample in f2 s using 
the original MCMC-generated posterior sample to evaluate the integral on the left-hand side of 
equation (J3]) and using a uniform resampling of volume defined by £l s followed by a naive Monte 
Carlo integration estimate of the right-ha nd side of equati on @. This may be done without the 



(|2012h ! 



elaborate tessellation algorithm defined in IWeinberg 

The sample sizes required are still bound by the curse of dimensionality. In particular, the 
evaluation of the left-hand side is a counting process with M points and the Poisson error is 
proportional to 1/y/M. An accurate evaluation of the right-hand side requires that the posterior 
probability be as uniform as possible in the subvolume. This requires an ever larger posterior 
sample as the dimensionality d increases. Fortunately, this condition can be easily diagnosed as 
part of the computation. 

This work suggests a number of future improvements. For example, the error analysis could be 
automated by using a stopping criterion for the initial posterior sample selection that enforces a 
predefined number of samples M in a volume with max{P(0|D)}/ min{P(0|D)} < L with L chosen 
such that the integral will converge quickly by cubature once the "core" of the posterior mode is 
reached. Other possible avenues include using sampling by quasi-random numbers or importance 
sampling based on the covariance matrix of the samples in the subdomain. A paper currently in 
preparation, will describe the application of this algorithm to ma rginal likelihood values used to 
classify astronomical images (jYoon et al.l . l201ll ; lYoon et al.l . l2013bl lal) . 
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Figure 3: Marginalized distribution of the four-component Gaussian distribution with randomly 
chosen centers in 8 dimensions using 10 5 MCMC samples for Test Distribution 4. The four panels 
describe different projections as indicated by the axis labels. Contour levels are as described in Fig. 
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